Branch History Matching: Branch Predictor Warmup for Sampled Simulation
Abstract
Computer architects and designers rely heavily on simulation. The downside of simulation is that it is very time-consuming: simulating an industry-standard benchmark on today’s fastest machines and simulators takes several weeks. A practical solution to the simulation problem is sampling. Sampled simulation selects a number of sampling units out of a complete program execution and simulates only those sampling units in detail. An important problem with sampling, however, is the unknown microarchitecture state at the beginning of each sampling unit. Large hardware structures such as caches and branch predictors suffer most from unknown hardware state. Although a great body of work exists on cache state warmup, very little work has been done on branch predictor warmup. This paper proposes Branch History Matching (BHM) for accurate branch predictor warmup during sampled simulation. The idea is to build, for each sampling unit, a distribution of how far one needs to go back in the pre-sampling unit to find the same static branch with a similar global and local history as the branch instance appearing in the sampling unit. Those distributions are then used to determine where to start the warmup phase for each sampling unit, given a total warmup length budget. Using the SPEC CPU2000 integer benchmarks, we show that BHM is substantially more efficient than fixed-length warmup in terms of warmup length for the same accuracy. Conversely, BHM is substantially more accurate than fixed-length warmup for the same warmup budget.
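The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: a trace is a list of `(pc, taken)` pairs, "similar" history is simplified to an exact match over the last `hist_len` outcomes, and the budget is split by raising one shared coverage level until the total warmup length would exceed the budget. The function names `histories`, `bhm_distances`, `warmup_for_coverage` and `allocate` are hypothetical.

```python
import math


def histories(trace, hist_len):
    """Return (pc, global_history, local_history) as seen just before each branch."""
    ghist = ()
    lhist = {}
    keys = []
    for pc, taken in trace:
        local = lhist.get(pc, ())
        keys.append((pc, ghist, local))
        # Keep only the most recent hist_len outcomes (hist_len >= 1 assumed).
        ghist = (ghist + (taken,))[-hist_len:]
        lhist[pc] = (local + (taken,))[-hist_len:]
    return keys


def bhm_distances(trace, sample_start, hist_len=4):
    """For each branch in the sampling unit, find how far back the latest match is.

    The result holds, per branch instance at index >= sample_start:
      0     if the latest match lies inside the sampling unit itself,
      d > 0 if it lies d dynamic branches before sample_start,
      None  if no earlier instance has the same pc and identical histories.
    """
    keys = histories(trace, hist_len)
    last_seen = {}
    for j in range(sample_start):
        last_seen[keys[j]] = j
    distances = []
    for i in range(sample_start, len(trace)):
        j = last_seen.get(keys[i])
        if j is None:
            distances.append(None)
        elif j >= sample_start:
            distances.append(0)
        else:
            distances.append(sample_start - j)
        last_seen[keys[i]] = i
    return distances


def warmup_for_coverage(distances, coverage):
    """Smallest warmup length covering `coverage` of the matched branch instances."""
    found = sorted(d for d in distances if d is not None)
    if not found or coverage <= 0:
        return 0
    k = math.ceil(coverage * len(found)) - 1
    return found[k]


def allocate(per_unit_distances, budget, steps=100):
    """Assumed policy: raise a shared coverage level while the total fits the budget."""
    best = [0] * len(per_unit_distances)
    for s in range(steps + 1):
        lengths = [warmup_for_coverage(d, s / steps) for d in per_unit_distances]
        if sum(lengths) > budget:
            break  # lengths only grow with coverage, so stop at the first overflow
        best = lengths
    return best
```

Under these assumptions, `bhm_distances` yields the per-sampling-unit distribution of match distances, and `allocate` turns several such distributions into one warmup length per sampling unit. A sampling unit whose branches find their matches close by receives a short warmup, so the budget can go to units that need to look further back.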
Similar resources
Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation
Copyright © 2003 IEEE. Published in the Proceedings of the 2003 International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2003, Austin, Texas. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists...
Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation
This paper explores techniques for speeding up sampled microprocessor simulations by exploiting the observation that of the memory references that precede a sample, references that occur nearest to the sample are more likely to be germane during the sample itself. This means that accurately warming up simulated cache and branch predictor state only requires that a subset of the memory reference...
Context-based Branch Prediction for High-Performance and Low-Power Computing
Advanced processors can simultaneously execute multiple instructions in parallel to achieve performance. Branches introduce control dependence between instructions. Branch prediction therefore is important to modern processors. Most present predictors use branch history to predict branch outcomes. Using branch history alone results in delay for identifying mispredictions. In this paper, a conte...
Applications of Machine Learning Techniques to Systems
The perceptron predictor is a simple neural network that works as an alternative to the commonly used branch history table (BHT) branch predictor built from two-bit counters. Perceptrons achieve higher accuracy than traditional BHT branch predictors because they can make use of longer branch histories, given the same hardware budget. Although having very similar organization to BHT branch predictors, Perceptrons’ ...
A Branch Predictor with New Recovery Mechanism
To improve the performance of wide-issue superscalar processors, it is essential to increase the instruction fetch and issue rate. Removal of control hazard has been put forward as a significant new source of instruction level parallelism for superscalar processors and the conditional branch prediction is an important technique for improving processor performance. Branch mispredictions waste a ...